allenkaci_LATE_92524_8341950_Progress Report 4.3.2020.pdf
- Name: Internship Organization: Mentor Name: Raj Shah Dates of 2-week period covered:
- Task 2: Created dashboards of informative and descriptive statistics that were presented to upper management.
- A system was already in place, but it was not easy to categorize or distinguish between constituents.
- The current system was slightly modified to include all needed data, as well as the categories each component was labeled with.
- This ABF module was used to make daily, sometimes multiple-times-a-day, updates to the tracker.
- This tracker was presented multiple times to upper management to give a summary of the ABF business unit.
- A database pulling from the system of record is being distributed, and I am utilizing it constantly.
- Task 2 outcome: A few spreadsheet dashboards have been created to report, on an ongoing basis, what is being asked of ABF.
- A standardized tracking system now includes primary database keys that were previously not being used.
- I have been adding them to all ad-hoc reports to help cross-check and maintain an accurate and efficient tracking system.
almasriosama_94996_8338669_Report5_Almasri.pdf
- Name: Osama AlMasri Internship Organization: WestRock Company Mentor/Preceptor's Name: Mitesh Patel Dates of 2-Week Period Covered: 3/23/2020-4/3/2020
- Current Tasks: My project is within the Human Resources (HR) group at WestRock.
- The fourth two-week period of the internship went slightly off track because our team received urgent analytics requests from leaders in the organization.
- Task 3 was to bridge the gap between statistics and the business, if the model requires it.
- Task 4 (added now) is to work on the urgent requests we received.
- Task 4 progress: We received a request from the Finance organization to build a dashboard that shows the different employee- and employer-paid benefits and taxes in the previous two fiscal years.
- I then had to seek explanations from subject matter experts to identify which transactions were employer versus employee paid.
- I used QlikView to visualize the different aspects of the request: by business unit, by payer entity, and by union status.
- We then received another request to build a visual analytics solution using Power BI.
- I took an aggregation of the data from QlikView, entered it into Power BI, and built a dashboard that was then sent to leadership.
bapatanjali_126445_8316575_Progress Report 5.pdf
- Name: Anjali Bapat Internship Organization: GoodRoads LLC Mentor's Name: Chris Sunde Dates of first report covered: Mar 23rd – Apr 1st
- Task 2: Creating a data pipeline to load data into a Google Cloud database and create route optimization.
- Specific steps/progress Task 1: Cleaning the data collected from OpenStreetMap. We want to onboard a few cities, but we could not find relevant data on the internet.
- We even asked city authorities to give us the data, but sometimes that does not work.
- We work only with city-owned streets, so the main task is to filter those streets by going back and forth between the QGIS map view and SQL filtering.
- Task 2: Creating a data pipeline to load data into a Google Cloud database and create route optimization.
- Created the data pipeline and routes for almost all the cities on our list.
- Got a data subset of one type of maintenance for Matthews and will apply the same to the others.
- Pulling data from OpenStreetMap and looking for the desired street types.
- Perform linear regression to predict ratings.
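- As a rough illustration of this step, a minimal scikit-learn sketch, assuming the pipeline's output is a table with hypothetical feature columns and a Rating column:

```python
# Minimal sketch: linear regression to predict street ratings.
# Column names (length_m, age_years, traffic, Rating) are hypothetical.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("streets.csv")  # hypothetical export from the pipeline
X = df[["length_m", "age_years", "traffic"]]
y = df["Rating"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print("R^2 on held-out data:", r2_score(y_test, model.predict(X_test)))
```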
bartonronald_LATE_125819_8361870_Progress Report 5.pdf
- We also worked on code testing for the ULM rewrite; it was mostly a QA exercise to help the primary coder and report findings/issues.
- As a refresher, to start the process I created a list of the columns right before the model runs and exported it to CSV.
- This is mostly manual work: looking at the variable names and descriptions from a data dictionary and assigning a category.
- We ran Python univariate scripts that generate charts of exposure against loss, premium, peril, etc.
- Once you have these univariate charts, which the script creates in Excel, you can compare similar variables and see if there are problems or bad data.
- Another example would be comparing two similar variables that are trying to show the same data: if one has a lot of missing records, the other is likely the better choice.
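- To illustrate that comparison, a small pandas sketch, assuming two hypothetical variables that should carry the same information:

```python
# Sketch: pick between two similar variables by completeness,
# mirroring the univariate comparison described above.
# The variable names are hypothetical stand-ins.
import pandas as pd

cols = pd.read_csv("model_input_columns.csv")  # columns exported before the model runs
miss = cols[["roof_age_v1", "roof_age_v2"]].isna().mean()
print(miss)                 # fraction of missing records per variable
print("Prefer:", miss.idxmin())  # fewer missing records -> likely the better choice
```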
- Throughout the process we discovered various errors and presented them to the developer to fix and update the GitHub code.
- We finished testing all the models by the Sprint Review and ironed out any issues in the new code!
- It helps to have a team do some dedicated testing, like me and the other intern in this case, so the developer can work on deploying changes quickly.
- Data science certainly requires some domain knowledge from within your industry as well; we had many meetings with business partners to get more information during our variable-testing exercise.
cantyjeremiah_35429_8303916_Report5_Canty.pdf
- Name: Jeremiah Canty Internship Organization: UNC Charlotte Mentor/Preceptor's Name: Dr. Doug Hauge Dates of 2-Week Period Covered: 3/21/2020 – 4/3/2020
- Task I: Learn more about certain techniques in Python for manipulating the visualizations into the type of display I want/need.
- Task I progress: I first had to research Matplotlib, and use trial and error, to configure the visualizations to the specific results I wanted and to analyze and display the data in the correct format.
- Have started drawing conclusions and interesting findings from the visualizations that will help us predict run times.
- I created two new charts to help explore the data quality differences amongst years.
- I also had to create visualizations that display the slope and intercept for each year and contrast them to draw conclusions.
- Task I outcomes: Manipulated the data to create more detailed visualizations.
- I have edited and created a chart that displays the regression for boys' athletes, separated by running-season year and put on equal scales.
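- A minimal Matplotlib sketch of that kind of chart, assuming hypothetical columns for year, meet number, and finishing time:

```python
# Sketch: fit and plot a regression line per running-season year on equal scales.
# The DataFrame columns (year, meet_number, time_sec) are hypothetical.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.read_csv("boys_runs.csv")
fig, ax = plt.subplots()
for year, grp in df.groupby("year"):
    slope, intercept = np.polyfit(grp["meet_number"], grp["time_sec"], 1)
    xs = np.linspace(df["meet_number"].min(), df["meet_number"].max(), 50)
    ax.plot(xs, slope * xs + intercept, label=f"{year}: m={slope:.2f}, b={intercept:.1f}")
ax.set_xlim(df["meet_number"].min(), df["meet_number"].max())  # equal scales across years
ax.set_xlabel("meet number"); ax.set_ylabel("time (sec)"); ax.legend()
plt.show()
```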
- Task III outcomes: I created insightful visualizations that display relationships.
- Using Excel, I created charts that display the relationship between each year's regression intercepts and slopes and the progression of the athletes.
cardenasarturo_27332_8320477_Report5_Cardenas.pdf
- Name: Arturo Cardenas Internship Organization: APHI - Academy for Population Health Innovation Mentor/Preceptor's Name: Dr. Michael Dulin Dates of 4-Week Period Covered: March 21st to April 3rd.
- Changing projects, due to Mecklenburg County closing offices to prevent the spread of the Covid-19 virus.
- After some conversations with Mr. Bynum, we concluded that it would be difficult to finish this project before the end of April.
- I asked Dr. Dulin for permission to start working on a backup project using open-source datasets.
- Use all available data to try to find new patterns: I will use cluster analysis, and also predict some dependent variables, possibly predicting type I or type II diabetes using logistic regression or random forest.
- Task III progress: This is the data science part. I have not started on this task yet, but I will start next week. I am planning to predict type I or type II diabetes using logistic regression and random forest, and I am also planning a cluster analysis.
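- As a sketch of what that modeling step could look like, assuming a hypothetical open dataset with a binary diabetes-type label:

```python
# Sketch: logistic regression and random forest to classify diabetes type,
# assuming an open-source dataset with a binary label `type2` (1 = type II).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("diabetes.csv")  # hypothetical open dataset
X, y = df.drop(columns="type2"), df["type2"]

for model in (LogisticRegression(max_iter=1000), RandomForestClassifier(random_state=0)):
    score = cross_val_score(model, X, y, cv=5).mean()  # 5-fold accuracy
    print(type(model).__name__, round(score, 3))
```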
- Task I outcomes: The data design and specs for the project; I will do an entity-relationship diagram.
- Task II outcomes: Design and build an easy-to-use R Shiny dashboard, as illustrated on the next page.
- Task III outcomes: Cluster analysis on some of the variables, and logistic regression to predict medical conditions such as type I and type II diabetes.
- (b) It is better to stay at home, to avoid contact with the contagious disease; (c) it's always good to have a backup project, even one built from open-source datasets.
copeblake_23876_8339517_Progress Report 5.pdf
- Name: Blake Cope Internship Organization: Sports Business Journal Mentor's Name: Derick Moss Dates of 2-Week Period Covered: 3/23/20 – 4/3/20
- I am continuing my work with the Sports Consumer research data.
- The survey focuses on sports fans' viewing habits for each of the major professional leagues (MLB, NBA, NFL, NHL, and MLS).
- For example, I found that the most popular reason for interest in MLB was fantasy/gambling, while the next two biggest reasons were fans who play video games and interest in watching the highest level of play.
- I repeated this process and presented my findings to a couple of our data analysts.
- I found that five was the optimal number of features to use to model MLB interest, but the accuracy was low, so I may do some further data cleaning to improve it.
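- The report doesn't say which feature selection method was used; as one possible illustration, a scikit-learn sketch that scores each number of kept features by cross-validation, with a hypothetical survey extract:

```python
# Sketch: finding the best number of features for a binary "interested in MLB"
# label. The file and column names are hypothetical; the method is illustrative.
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

df = pd.read_csv("survey.csv")  # hypothetical survey extract
X, y = df.drop(columns="mlb_interest"), df["mlb_interest"]

for k in range(1, X.shape[1] + 1):
    pipe = make_pipeline(SelectKBest(f_classif, k=k), LogisticRegression(max_iter=1000))
    print(k, "features:", cross_val_score(pipe, X, y, cv=5).mean().round(3))
```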
- Presenting my work to some of the fellow data analysts made me feel a little
- Communicating results is definitely a skill I need to work on in order to accomplish my career goals, so getting the opportunity to do so last week was great.
- This was my first time using a feature selection method, so I was glad to gain
- Continue to work towards creating a predictive model for this data.
demirelif_126087_8320437_Report5_Demir.pdf
- Progress Report 5
- Name: Elif Demir Internship Organization: Innovation Partners LLC Mentor Name: Ellen Jiao Dates of 2-Week Period Covered: 16 March – 2 April
- Current Task
- Due to Covid-19, I wasn't able to make progress on my project during this time period.
- My mentor did not allow me to work remotely.
- Therefore, Progress Report 4 was my last report for the internship.
- In this period, I made progress on my internship presentation and report.
- Specific Steps | Progress
- Takeaways | Lessons Learned
- 2 Week Plan
duttaroma_117177_8339560_ProgressReport5_dutta.pdf
- Internship Organization : Wells Fargo
- Mentor / Preceptor 's Name : Subhabrata Mukherjee
- As part of my task, I have started on the initial model design.
- Research has also been performed in this period to check whether we could leverage AWS services to build our model and subsequently deploy/host it in a cloud environment.
- Specific steps / progress
- Initial Model designing
- Feasibility study for cloud deployment
- No specific outcome to share for this week.
- Better understanding of regularization; a study on pre-trained language models; the concept of pre-trained sentence vectors.
- ▪ Model building ▪ Testing ▪ Validation ▪ Convert model output to a tangible score ▪ Create visualizations ▪ Create necessary documentation and upload it to GitHub/Bitbucket ▪ Final validation and model tuning if required
gargdivya_LATE_116438_8371202_Progress Report 5.pdf
- Name: Divya Garg Internship Organization: Open Data Nation Mentor/Preceptor's Name: Carey Anne Nadeau Dates of 2-Week Period Covered: 20th Mar – 3rd Apr
- These two weeks I worked on the first part of State Task 5, as this task is divided into two parts: matching the road network to weather data, and combining weather data into the big crash data frame.
- For the first part of this task, I used Google BigQuery to extract weather data with SQL queries to match it with roads, after which it was matched in Google Datalab by creating matching-algorithm code.
- For State Task 5, the goal was to extract weather data from Google BigQuery and attach it to stations chosen based upon the criteria set by the team.
- After extracting the chosen station data, I wrote SQL queries to remove weather stations that had
- I was able to extract and clean all the required weather stations that will be needed for further analysis, and was able to plot the data.
- • Worked with Google BigQuery to extract data.
- • Worked with Google Datalab to clean, analyze, and visualize the required data.
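- As an illustration of the BigQuery extraction step described above, a minimal sketch against a public NOAA weather dataset; the stations and filter criteria here are stand-ins, not the team's actual ones:

```python
# Sketch: pulling station-level weather data from a public BigQuery dataset.
# The station IDs are made-up placeholders for "stations chosen by the team".
from google.cloud import bigquery

client = bigquery.Client()
sql = """
SELECT stn, year, mo, da, temp, prcp
FROM `bigquery-public-data.noaa_gsod.gsod2019`
WHERE stn IN ('723140', '723060')   -- placeholder station list
"""
weather = client.query(sql).to_dataframe()
print(weather.head())
```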
- I will be working on part 2 of State Task 5, which is merging the weather data with the road network and crash files.
griderhansen_95247_8328777_Progress_Report_5_Grider.pdf
- Task 1: Complete my first project deliverable, which will be a Jupyter notebook containing my exploratory data analysis (EDA) for FINRA complaint code classification.
- Since classification is a relatively simple NLP task for which a bag-of-words (BOW) approach is often effective, I will try the most simplified techniques first.
- In theory, if I've processed the data sufficiently and if word frequencies are indicative of class, these associations should make intuitive sense.
- c. Look at the ratio of message length per document for each product/problem code – this will give us an idea of how much data (information) we have to train a classification model on for each category that we want to classify.
- I also studied the format of the communications and designed regex operations to successfully remove punctuation and newlines, replace numerals, and lowercase all tokens.
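- A minimal sketch of that regex cleanup (the exact patterns here are my assumptions, not necessarily the project's):

```python
# Sketch of the described preprocessing: strip punctuation and newlines,
# replace numerals with a placeholder token, and lowercase everything.
import re

def preprocess(text: str) -> str:
    text = text.replace("\n", " ")            # remove newlines
    text = re.sub(r"\d+", " NUM ", text)      # replace numerals with a token
    text = re.sub(r"[^\w\s]", " ", text)      # remove punctuation
    return re.sub(r"\s+", " ", text).strip().lower()

print(preprocess("RE: Account #12345 -- fees charged twice!\nPlease advise."))
# -> "re account num fees charged twice please advise"
```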
- The overall task outcome is a polished EDA notebook that will result in the text being properly preprocessed and analyzed, so that I can proceed to the next phase of the project, which is actually building the classification model(s).
- Task 1 – After receiving the data, I noticed that the text of each customer message appeared to be written to the database in an email-like format.
- All these communications are collected by various representatives across the company and then passed to CCT as emails that they receive either through their Outlook inboxes or through a system called Unified Workflow (UW).
- In fact, these things add noise and may need to be systematically removed via Python regular expressions in the text preprocessing phase.
- Looking at a sample of these subject lines, clients do tend to concisely (if somewhat crudely) state their problem there, as is good practice in any effective email communication.
gulleyalexander_117139_8330573_Report5_Gulley-1.pdf
- Name: Alexander Gulley Internship Organization: Ally Bank Mentor's Name: Jiamin Lei Dates: March 23 – April 3
- - Continue to tune the model
- o Added TBATS and Holt-Winters models into the mix o Added cross-validation to the process to find the best model out of sample.
- The cross-validation used a forecast horizon of 3 and took quarter steps for a year out.
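- The models above were fit with R's forecast package; purely as an illustration of the same idea in Python, a Holt-Winters fit with a rolling-origin evaluation at horizon 3 on a made-up quarterly series:

```python
# Illustration only: a rough Python analogue of rolling-origin cross-validation
# for Holt-Winters, horizon 3, stepping one quarter at a time.
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

y = pd.Series(np.random.rand(40) + np.tile([0, .5, 1, .5], 10))  # stand-in quarterly data

errors = []
for origin in range(28, len(y) - 3):                 # advance one quarter per fold
    fit = ExponentialSmoothing(y[:origin], trend="add",
                               seasonal="add", seasonal_periods=4).fit()
    fc = fit.forecast(3)                             # forecast horizon of 3
    errors.append(np.abs(fc.values - y[origin:origin + 3].values).mean())
print("mean out-of-sample MAE:", np.mean(errors))
```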
- Met with stakeholders/subject matter experts
- o Discussed the initial model results with stakeholders.
- o Started outlining the paper and presentations.
- External regressors do not work with R's forecast package (version 8.5), which drops columns where everything is zero.
- This causes an issue during cross-validation, as some regressors are only present during the last year.
- Microsoft Open R's MRAN has not been updated to the newest version of forecast that fixes this issue.
guptasmits_116219_8329708_Progress Report 5.pdf
- Your Name – Smitakshi Gupta (Smits) Internship Organization – Institute of Social Capital, UNCC
- Mentor/Preceptor's Name – Justin Lane Dates of 3-Week Period Covered in the Progress Report – 03/28/2020 to 04/03/2020
- In the last period (24th to 28th March), I looked closer into the economy worksheet.
- I wanted to add the Race column to see the demographics of those requesting food stamps.
- Task 1 – I looked into the health worksheet for the Mecklenburg Quality of Life dataset.
- The data can be analyzed to see whether the numbers have increased over time.
- Task I – I did not know anything about the low-cost health care system, so I took time to research how it works.
- Task 1 – In some areas, the number of grocery stores has increased, and in some, the grocery stores have just moved from one block to another.
- There has been a small increase in the health care system, but not much was done in that segment.
- I am hoping to speak to my manager to clear up some of my doubts and see how things go from there.
hakasmaggie_127066_8336414_ProgressReport5Hakas.pdf
- Name: Maggie Hakas Internship Organization: The Hartford Mentor/Preceptor's Name: Heather Grebe/Lane Coonrod Dates of 2-Week Period Covered: 3/20-4/3
- This sprint went a lot more smoothly after finally getting settled into working from home.
- Task I progress: This project continued Ron's and my work together, as we ran the univariates and each took half of the plots to make our recommendations.
- After running the univariates, we looked at all the plots that were created and decided how we would recommend cleaning up variables.
- There were examples like having 7 variables indicating whether or not there was a pool, or one outlier for a particular year that really skewed the data.
- We finished all the recommendations and created Excel files linking to the univariates to make them easier for everyone who would need them.
- Task II outcomes: The recommendations have been passed off and accepted by multiple business partners.
- I have started researching how certain parameters are chosen in a grid search and how they interact.
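- As a generic sketch of the kind of grid search being researched (the model and parameter grid here are illustrative, not the project's):

```python
# Sketch: a small grid search where parameters interact
# (tree depth and learning rate trade off against each other).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, random_state=0)  # stand-in data
grid = {"n_estimators": [100, 300],
        "max_depth": [2, 3, 4],
        "learning_rate": [0.05, 0.1]}
search = GridSearchCV(GradientBoostingClassifier(random_state=0), grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```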
- Currently, Ron and I are going through a smaller list of variables that are trustworthy, so the models can be run without interference from messy data.
- (b) Created a code file with object-oriented programming (it wasn't a particular task, but I learned it with Heather on the side).
kishorekumarsudha_95533_8336073_Progress Report_04032020.pdf
- Name: Sudha Kishorekumar Internship Organization: CVS Health Mentor Name: Lisa Klein Dates of 2-Week Period Covered: 03/23/2020 – 04/03/2020
- Review MS Access database queries to understand the steps involved in generating the membership report for the Annual HOQ. This is a mandatory step preceding the HEDIS submission.
- Membership data is retrieved from the Aetna data warehouse as of December 31st of the prior year.
- Once reviewed, membership data is then summarized based on the line of business and product.
- The MS Access database queries were analyzed to understand the steps involved in generating the Annual HOQ membership report, due to a lack of documentation.
- These entities were deemed not eligible for 2020 HEDIS reporting by the business stakeholders.
- Rollup codes depend on the line of business and product: Commercial HMO products are summarized based on the network service area, Commercial PPO products based on the state code, and Medicare HMO and PPO products by H-Contracts, with D-SNPs summarized based on the H-Contract and Plan Benefit Package identifier.
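- The actual report was built in MS Access and migrated to SAS; as an illustration only, the rollup logic above in pandas, with hypothetical column names:

```python
# Illustrative pandas version of the rollup rule described above.
import numpy as np
import pandas as pd

m = pd.DataFrame({"lob": ["Commercial", "Commercial", "Medicare"],
                  "product": ["HMO", "PPO", "HMO"],
                  "network_area": ["NSA1", None, None],
                  "state": [None, "CT", None],
                  "h_contract": [None, None, "H1234"]})

m["rollup"] = np.select(
    [(m.lob == "Commercial") & (m["product"] == "HMO"),
     (m.lob == "Commercial") & (m["product"] == "PPO")],
    [m.network_area, m.state],
    default=m.h_contract)  # Medicare lines roll up by H-Contract (plus PBP ID for D-SNPs)
print(m.groupby(["lob", "product", "rollup"]).size())
```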
- The Annual HOQ membership report was successfully migrated to SAS.
- Technical documentation was developed to outline the steps involved in generating this report.
- Due to the lack of documentation, this issue was discussed with the business stakeholders to determine the as-of date.
laixinxin_121407_8327779_Report5_Lai.pdf
- In order to improve the efficiency of the model, the analyst decided to conduct PCA even though there are not too many dimensions in the original dataset.
- In this way, the executives can be convinced that even if there are a lot of variables in the data later, statistical models can still keep a balance between accuracy and efficiency.
- After comparing the summaries of the PCAs, the analyst decided to choose the first 2 PCs and combine them with the categorical variables to create df2; a summary of df2 is shown below:
- It is still worthwhile to conduct PCA in order to convince the executive level that if we have many input features but not enough sample data, a statistical model can still find a way to lift the accuracy.
- And it is intuitive to reach such a conclusion, because the company is a wholesale distributor in NC, SC, and VA, which means the categorical variable REGION consists of locations that are all very close to each other, geographically speaking.
- are conceptually related, while a similar conclusion can also be applied to the Frozen and Fresh variables.
- 2) PCA can be incredibly helpful when there are not enough data samples for a large number of input features.
- In the model, the analyst made the choice based on the PCs whose standard deviations are greater than 1.
- Moreover, PCA is a black box and lacks interpretability, because all the PCs are statistically independent and are linear combinations of all the original input features.
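- A small sketch of the selection rule mentioned above (standardize, run PCA, keep components whose standard deviation exceeds 1), on a stand-in dataset:

```python
# Sketch: keep PCs whose standard deviation > 1 (eigenvalue > 1 on scaled data).
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_wine().data)  # stand-in for the wholesale data
pca = PCA().fit(X)
sd = np.sqrt(pca.explained_variance_)   # per-component standard deviation
keep = sd > 1
print("component SDs:", sd.round(2))
print("kept components:", int(keep.sum()))
scores = PCA(n_components=int(keep.sum())).fit_transform(X)  # to combine with categoricals
```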
- 2) I will start to combine all the work done into a PowerPoint and come up with a good story to present to the company, as well as to the school for the final.
paulkabita_126534_8315345_Progress Report 5.pdf
- During the last two months, we concentrated on the opportunity study, data collection, and requirement gathering for our app.
- As our research is mainly focused on building a health recommendation system for patients with chronic pain and cancer, we collected related data sources online.
- We performed exploratory data analysis on patients with chronic pain who have gone through breast cancer surgery.
- Task II: Analyse the collected data and build an article recommender system.
- Task III: Create database schema structures for the system and build a web project as per the dashboard prototype.
- We collected health advice articles online to build our initial database.
- However, as we are designing a new system, we faced the ‘cold start’ problem while building our model, which means we do not have enough data to find correlations between several users’ choices.
- Task III progress: After analysing the system and the basic requirements of the recommender engine, we came up with a schema design with two tables.
- Task II outcomes: We did an initial analysis and are in the process of building the recommender engine.
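- One common workaround for the cold-start problem is a content-based recommender that needs no user history; a minimal sketch with a made-up article corpus (not our actual engine):

```python
# Sketch: content-based article recommendation via TF-IDF similarity,
# which works before any user interaction data exists.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

articles = ["managing chronic pain after surgery",
            "nutrition tips for breast cancer recovery",
            "gentle exercise for post-surgical pain"]  # stand-in corpus

matrix = TfidfVectorizer(stop_words="english").fit_transform(articles)
sims = cosine_similarity(matrix)
query = 0                                  # recommend articles similar to article 0
ranked = sims[query].argsort()[::-1][1:]   # skip the article itself
print("recommend:", [articles[i] for i in ranked])
```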
- (a) Adding webpages to the web architecture; (b) come up with a basic article recommender engine for our health application; (c) correlation analysis between variables of the cancer patient dataset; (d) find more related data sources and analyse them.
richterlyndsay_4676_8334820_Report5_Richter.pdf
- Name: Lyndsay Richter Internship Organization: UNC Charlotte Student Affairs Research and Assessment (SARA) Mentors: Dr. Erin Bentrim and Dr. Ellissa Brooks Nelson (SARA) Dates: March 21 – April 3
- Two weeks ago, I thought COVID-19 might be a relatively inconvenient but minor interruption to my full-time job and internship.
- The last two weeks have been consumed by my responsibilities as the lead campus communicator in the division of Student Affairs.
- • Working with Housing & Residence Life to execute communication activities related to the March 17 decision to ask students to vacate the residence halls by March 20, and the associated communication for an exemption process.
- Supervising my six direct reports and navigating their transition to remote and/or leave work environments.
- Then came the unexpected news this week of the County's and State's request for our South Village residence halls for coronavirus support, requiring a massive communication and logistics effort, currently underway, to organize the retrieval of belongings and the on-site move-out process for approximately 2,000 residential students.
- I am going to re-assess my internship progress this weekend and reach out to Professor Hague on Monday to discuss next steps.
- There is a third project that I would like to make progress on in April, but I realize it's better to have a conversation sooner rather than later in the event of any further surprises.
- My internship supervisors are aware, supportive, and understanding, as they are managing their own professional and personal challenges.
sadikovibrokhim_125664_8334261_Report 5.pdf
- Over the last week, my team and I made considerable progress on research and strategy development.
- Also, we now need to prepare a report for the executive team on what rapid analytical actions can be offered to clients in such a pandemic time.
- We set up quick daily stand-up calls, in an agile environment, to report our progress on tasks.
- I am currently working to propose an innovative solution for optimizing debt collection.
- I have been going through several research studies that mainly focus on machine-learning-enabled dialing optimization to maximize recovery of bad debts, and I am also doing some hands-on analysis of different state-of-the-art methodologies to generate a delinquency scorecard for clients, so that they can be more proactive in preventing possible defaulters.
- I have already built the Executive Summary section, with interactive features, using the Shiny dashboard tool.
- As for task one, we are challenged to come up with a rapid action plan to mitigate a possible upcoming downturn in our clients' revenue generation.
- Therefore, my deliverable would be a concise report of possible methodologies that could be implemented to optimize debt collection using different approaches.
- It is actually cool, because I am challenging myself not only to do my assigned project but at the same time to be part of an innovation hub.
- Set up a couple of meetings with managers, peers, and executives; finalize the report together with a short presentation of my research on debt collection optimization; complete and review the second tab section of the dashboard.
serapinzach_29510_8330435_Report5_Serapin.pdf
- Zachary Serapin Wells Fargo Roy Cano March 21 – April 2nd
- I initially wrote these statements in Excel, as it is easier to work with at the moment, but I have since gone back and translated the code to Python to help with reproducibility.
- After flagging these, I started to analyze the behavior of these models through summary tables and visualizations, but quickly realized I was suffering from information overload and tried to narrow the scope even more.
- I added more conditional statements to track specific models based on their importance to the network and their activity levels.
- My mentor pointed out that in order to correctly understand how new models are “maturing” in the network, I need to configure the data using “Month on Book”.
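- A small pandas sketch of that "Month on Book" transformation, with hypothetical column names for a model's go-live date and the observation date:

```python
# Sketch: months elapsed since each model went live, so maturity
# can be compared on a common axis across models.
import pandas as pd

df = pd.DataFrame({"model_id": ["A", "A", "B"],
                   "live_date": pd.to_datetime(["2019-06-01"] * 2 + ["2020-01-15"]),
                   "obs_date": pd.to_datetime(["2019-08-20", "2020-02-10", "2020-03-05"])})

df["month_on_book"] = (df.obs_date.dt.to_period("M") -
                       df.live_date.dt.to_period("M")).apply(lambda d: d.n)
print(df)
```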
- Even though I thought I had completely worked through the problem, I could have done a better job of trying to understand what I was after from the beginning, so I could have saved time.
- In previous projects, I was able to follow a fairly straightforward process where I would scrape or take a dataset and clean it in order to build a model.
- I think it's important to plan ahead, just as you would when developing a model, to keep on track of the objective and not go down various “rabbit holes”.
- I can begin creating a dashboard with visualizations and tables that help stakeholders draw easy conclusions about how new models act as they mature through the network.
- Additionally, I can drill down and see how a model's influence on the network is affected by particular attributes or by how long it has been in use.
shuklabalya_1256_8335823_Progress Report 5 - Balya Shukla.pdf
- Name: Balya Shukla Internship Organization: Genpact Mentor/Preceptor's Name: Kaushik Chavan Dates of 2-Week Period Covered: 03/30/2020 – 04/03/2020
- This week we focused on creating a rapid action plan, in light of COVID-19, specific to the debt collection industry, and continued training on Celonis.
- Specific steps / progress
- Task I progress: I am close to completing the training on Celonis.
- Task II progress: I spent the week researching various debt collection startups that are disrupting the industry through AI, and their action plans during COVID-19.
- Task I outcomes: The goal of this task is to use Celonis to perform machine learning.
- Task II outcomes: The goal of this task was to eventually take the action plan to Genpact's collection department and execute the plan using their internal data.
- (a) Analytics can be used to optimize various business processes, even during a crisis.
- (b) Strategy planning can take longer than execution.
- (a) Complete the training modules and take the certification exam (b) Complete the strategy report to present to the debt collection department
singaravelmurali_110276_8317951_Report5_Singaravel.pdf
- Muralidharan Singaravel Student ID - 801 059 720 Spring 2020 DSBA Internship Progress Report 5
- Current Tasks: My internship project is titled “Customer Complaint Analysis Using Machine Learning”, and the main objective of the project is to analyze customer complaint data and answer questions from leadership by identifying key and emerging trends, volumes, themes, and insights that will help in root-cause analytics development.
- For these two weeks, my planned tasks are to develop and test models with various machine learning algorithms: classification, to classify consumer complaints into predefined categories, and regression, to predict the reasons for customer complaints.
- The dependent variables for both research questions are binary, so I used classification algorithms like logistic regression, decision trees, and Naïve Bayes.
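- A minimal sketch of that comparison on a generic binary target (the features here are stand-ins, not the complaint data):

```python
# Sketch: comparing logistic regression, a decision tree, and Naive Bayes
# on a binary label, e.g. "closed with monetary relief" yes/no.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)  # stand-in features
for clf in (LogisticRegression(max_iter=1000),
            DecisionTreeClassifier(random_state=0),
            GaussianNB()):
    print(type(clf).__name__, cross_val_score(clf, X, y, cv=5).mean().round(3))
```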
- As part of the COVID-19 analysis project, I worked on designing a dashboard showing the growth of COVID-19-related complaints by day and, using geo maps in Tableau, showing the count of complaints by US state.
- Outcomes/Takeaways – Identifying the features that result in a complaint closing with monetary relief is important; the bank can learn from and fix the issues that resulted in such complaints, thereby reducing the monetary relief.
- I used Python for the above modelling, and the output was a presentation to senior leadership with the findings.
- A few insights from the COVID-19 Tableau dashboard: the number of complaints peaked around March 11, when the WHO declared the COVID-19 outbreak a global pandemic, and California and Florida had the most complaints as panic spread among the bank's customers.
- 2 Week Plan – For the next two weeks, my task is to analyze unstructured customer complaint text data, like summary and resolution comments, and perform topic modeling, trend analysis, and key-phrase extraction; it will consist of sentiment analysis, text clustering, text categorization, and ontology learning.
summeykelsey_1592_8339397_Progress Report 5.pdf
- Assisting the Large Account Management c. Conduct predictive analytics
- d. Continue to work on visualizations in Power BI e. Prepare for our April Spring Meeting (Online)
- My weekly meetings with my Vice President are to ensure I am staying on top of my
- This past week, my Vice President asked me to help LAM with some of their data initiatives and requests in various reports, so I have been assisting them.
- Being that everyone is working from home and only essential employees can be on our VPN at certain times, many of my meetings got cancelled.
- In our April Spring Meeting , I will be conducting a training for the entire team in
- pandemic allows me to stand with our Company's tradition of coming together to help each other in times of need.
- Coronavirus has impacted my timeline completely, and I hope I'll be able to get back on track by the presentation date.
- Train others c. Reach out to other people, even outside of your department/organization d. Run tests e. Practice makes perfect
- Conduct predictive analytics b. Edit presentation c. Adapt my duties as needed due to COVID-19
tomasikmarie_126529_8170274_Progress_Report_5.pdf
- Name: Internship Organization: Mentor's Name: Dates of 2-Week Period Covered:
- Current Tasks: Task 1 was to use the already-built API to get BLS data for 2019 and test it in the model.
- Task 2 was to gather data for additional hypotheses.
- Task 3 was to clean the data for the additional hypotheses.
- Task 4 was to help with a side project, doing cluster analysis for the engagement survey questions.
- Used the API I built a few months ago to gather data for 2019.
- Using R, I did a cluster analysis on the engagement survey questions to see if they should be grouped in a different way than they currently are.
- The new data that was added did not make local union membership significant, so the model remains the same.
- The cluster analysis came up with a slightly different grouping than is currently being used.
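- The analysis above was done in R; purely as an illustration, a Python sketch of grouping survey questions by how similarly they are answered, using hierarchical clustering on correlation distance with hypothetical question columns:

```python
# Sketch: cluster survey questions (columns), not respondents, so that
# questions answered similarly land in the same group.
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
responses = pd.DataFrame(rng.integers(1, 6, (200, 8)),
                         columns=[f"q{i}" for i in range(1, 9)])  # 1-5 Likert answers

dist = 1 - responses.corr().abs()       # similar questions -> small distance
labels = fcluster(linkage(squareform(dist.values, checks=False), method="average"),
                  t=3, criterion="maxclust")  # ask for 3 groups
print(dict(zip(responses.columns, labels)))
```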
- ● Finalize updated model ● Try model with only non-union plant data ● Present updated models to stakeholders ● Feature selection for engagement survey project
vavilalasrivan_19848_8338716_Internship - Progress Report #5.pdf
- Name: Srivan Vavilala Company: Vishion Mentor: Gurtej Singh Work Period: Mar.
- At the moment, my work mainly involves performing exploratory analysis on datasets from numerous affiliates.
- The main concern now is getting the data cleaned in time to begin working on the modeling portion of the internship.
- My approach right now is to continue learning best NLP practices in order to make
- So far, I've manually looked through each of the datasets in order to clearly understand how best to combine them down the road.
- Mainly, there was a lot of time that I had to put into researching, and much more over
- The tagging system idea that we're aiming for is also much more fleshed out.
- Always run your ideas by your supervisors when learning about a new area as they can
- NLTK is extremely versatile and will work well for the categorization project.
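- A minimal NLTK preprocessing sketch for the categorization work (the sample text and exact steps are illustrative assumptions):

```python
# Sketch: tokenize, drop stopwords, and lemmatize text before tagging.
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

for pkg in ("punkt", "stopwords", "wordnet"):
    nltk.download(pkg, quiet=True)

text = "Hand-woven navy blue throw pillows for modern living rooms"  # made-up example
tokens = nltk.word_tokenize(text.lower())
stop = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()
cleaned = [lemmatizer.lemmatize(t) for t in tokens if t.isalpha() and t not in stop]
print(cleaned)  # tokens ready for tagging/categorization
```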
- Understand how the data needs to be further cleaned for the categorization problem
vegesnakovidh_31534_8339351_Report5_Vegesna.pdf
- Name: Internship Organization: Mentor/Preceptor's Name: Dates of 2-Week Period Covered:
- wanted to because of other things I had to take care of and the current pandemic.
- There are some sections of the code that I am unsure how they would apply to the problem we are trying to solve.
- I plan to continue meeting with Dr. Lo and her team to better understand the problem they are facing and clarify other questions along the way.
- The first task (same as last week) was testing some of the functionalities in the example code on a sample of the data provided by Dr. Lo.
- I looked for possible packages and libraries in R commonly used for this type of problem.
- The second task (same as last week) was organizing all the data files and code on GitHub.
- A major part of a project is making sure all the necessary files and documents are organized properly.
- I uploaded all the code, files, and data onto a separate branch on GitHub.
- The other thing is to work on getting the setup for the R package started.
xiachunqiu_116382_8317275_Report5_Xia.pdf
- Internship Organization: University of North Carolina, Belk College
- The current project is to explore the customers' behaviors in the Yelp dataset.
- In addition, I try to estimate the TVEM (time-varying effect model) and the coefficients of each independent variable.
- Last time, the linear mixed model included only one random effect, which means only one independent variable has a random effect.
- The coefficients of the random effect are varying over time.
- However, the coefficient is pretty small and does not have much impact on the dependent variable.
- I also drew graphs of the coefficients of the random effects against years, but the changes over time are not large.
- I don't know whether this is a good phenomenon for our data; therefore, I need to do more research about how these random effects affect the dependent variable.
- Currently, I don't think I did a good job on this model with our Yelp dataset.
- The meaning of the outcome is not clear; I need to read more examples to understand the whole model and apply it to our dataset.
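- For reference, a minimal sketch of a linear mixed model with a single random slope, in the spirit of the model described above; the variable names and simulated data are hypothetical:

```python
# Sketch: linear mixed model where one independent variable (review_len)
# carries a random effect across groups (businesses).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({"business_id": np.repeat(np.arange(50), 20),
                   "year": rng.integers(2010, 2020, 1000),
                   "review_len": rng.normal(100, 20, 1000)})
df["stars"] = 3 + 0.01 * df.review_len + rng.normal(0, 1, 1000)

m = smf.mixedlm("stars ~ review_len + year", df,
                groups=df["business_id"], re_formula="~review_len").fit()
print(m.summary())
```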
xiaodiwen_117612_8337803_Report5_Xiao.pdf
- Name: Diwen Xiao Internship Organization: UNC Charlotte Mentor/Preceptor's Name: Professor Ming Chen Dates of 2-Week Period Covered in the Progress Report: 03/23/2020-04/03/2020
- Task I: Conduct the model estimation process using a temporal dynamic model with the software package R. Task II: Draw different types of figures for the detected areas of interest of the tested volunteers.
- Specific Steps/Progress Task I: For the model estimation process using the temporal dynamic model, I first researched related studies using temporal network analysis.
- Then, I went through a temporal social network analysis tutorial and downloaded the related packages.
- Task II: For the matching and synchronizing of the eye-fixation process, I first drew different types of figures in the professional software for the different areas of interest that people may be interested in, such as brands, price, and text.
- Outcomes Task I outcomes: I obtained an image from R for the temporal social network analysis: a static visualization of the temporal social network that shows every workshop and collaboration from the previous field experiment.
- Task II outcomes: I have drawn different types of figures in the professional software for the different AOIs.
- Takeaways/Lessons-Learned 1) Learned and understood how the temporal dynamic model works, and how to use the related packages in the analysis software R. 2) Learned the main goals of temporal dynamic analysis.
- 3) Conduct a robustness check based on the model estimation results, which is to check how the core and most important regression coefficient estimates behave, so as to validate the model and make sure both the data and the results are reliable.
- 4) Try other classifiers, such as ANN and random forest, for the model evaluation process if there is enough time.